Bayesian 4

SNR 610

Model: Wallcreeper

  • The Wallcreeper is a bird species that has declined in recent years

  • We will look at (simulate) 16 years of survey data (1990 to 2005)

  • Each year we sample random quadrats and record the proportion of quadrats occupied

  • We will use a normal linear model for this

Simulate our population

\[ y_i = \beta_0 + \beta_1 \, year_i + \epsilon_i \]

or, equivalently, renaming the coefficients (\( \alpha = \beta_0 \), \( \beta = \beta_1 \)):

\[ y_i = \alpha + \beta \times year_i + \epsilon_i \]
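Written distributionally, which is closer to how the Stan code below is structured, the same model is:

\[ y_i \sim \mathrm{Normal}(\mu_i, \sigma^2), \qquad \mu_i = \alpha + \beta \times year_i, \qquad i = 1, \dots, n \]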

n <- 16      # number of years
a <- 40      # intercept
b <- -0.7    # slope
sigma <- 4   # residual standard deviation
sigma2 <- sigma^2
x <- 1:16

y is the proportion of occupied quadrats

Save true values in an object

truth <- c(alpha = a, beta = b, sigma = sigma)

Simulation

y <- a + b*x + rnorm(n, 0, sigma)
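rnorm() draws fresh noise each run, so the simulated y changes every time. Calling set.seed() first (any value; 42 below is arbitrary) makes the simulation reproducible:

```r
set.seed(42)  # arbitrary seed; fixes the random draws
n <- 16; a <- 40; b <- -0.7; sigma <- 4; x <- 1:16   # same values as above
y <- a + b*x + rnorm(n, 0, sigma)  # identical simulation, now reproducible
```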

Plot the data

Note: shift the x axis to show calendar years!

library(ggplot2)

df <- data.frame(x = x, y = y)
ggplot(df, aes(x = x + 1989, y = y)) +
  geom_point(size = 2.5) +
  ylab("Prop. occupied (%)") +
  xlab("Year") +
  theme_classic()
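To visualize the trend directly, the least-squares line can be overlaid with geom_smooth() (optional; assumes the df and ggplot2 setup from above):

```r
ggplot(df, aes(x = x + 1989, y = y)) +
  geom_point(size = 2.5) +
  geom_smooth(method = "lm", se = TRUE) +  # fitted line with 95% band
  ylab("Prop. occupied (%)") +
  xlab("Year") +
  theme_classic()
```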

Run the frequentist model

model <- lm(y ~ x)
summary(model) 

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.7822 -1.6253 -0.2616  1.5073  6.8714 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  38.0633     1.6606  22.922 1.68e-12 ***
x            -0.6367     0.1717  -3.707  0.00234 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.167 on 14 degrees of freedom
Multiple R-squared:  0.4954,    Adjusted R-squared:  0.4593 
F-statistic: 13.74 on 1 and 14 DF,  p-value: 0.002344
lm_est <- c(coef(model), sigma = sigma(model))
lm_est
(Intercept)           x       sigma 
 38.0633267  -0.6366534   3.1666135 
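For later comparison with the Bayesian credible intervals, the frequentist 95% confidence intervals come from confint() (this assumes model is the lm fit from above):

```r
ci <- confint(model, level = 0.95)  # t-based 95% CIs for (Intercept) and x
ci
```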

Run the stan model

Load the rstan package (install it first if needed)

library(rstan)

Let’s run the model

First step: Bundle the data:

Use a list to bundle the response variable, the explanatory variable, and n (the number of observations)

dataList <- list(y = y, x = x, n = n)

Step 2: write the Stan model (no need to run this slide; the complete, runnable version appears under "All code")

Let's break it up piece by piece

# Write text file with model description in Stan language
cat(file = "model.stan", # This line is R code
"data { // This is the first line of Stan code
  int <lower = 0> n; // Define the format of all data
  vector[n] y; // ... including the dimension of vectors
  vector[n] x; //
}
parameters { // Define format for all parameters
  real alpha;
  real beta;
  real <lower = 0> sigma; // sigma (sd) cannot be negative
}
transformed parameters {
  vector[n] mu;
  for (i in 1:n){
    mu[i] = alpha + beta * x[i]; // Calculate linear predictor
  }
}
model {
  // Priors
  alpha ~ normal(0, 100);
  beta ~ normal(0, 100);
  sigma ~ cauchy(0, 10);
  // Likelihood (could be vectorized to increase speed)
  for (i in 1:n){
    y[i] ~ normal(mu[i], sigma);
  }
}

" )

Data and parameters

data { // This is the first line of Stan code
  int <lower = 0> n; // Define the format of all data
  vector[n] y; // ... including the dimension of vectors
  vector[n] x;
}

parameters { // Define format for all parameters
  real alpha;
  real beta;
  real <lower = 0> sigma; // sigma (sd) cannot be negative
}

Transformed parameters

transformed parameters {
  vector[n] mu;
  for (i in 1:n){
    mu[i] = alpha + beta * x[i]; // Calculate linear predictor
  }
}

Model

model {
  // Priors
  alpha ~ normal(0, 100);
  beta ~ normal(0, 100);
  sigma ~ cauchy(0, 10);
  // Likelihood
  for (i in 1:n){
    y[i] ~ normal(mu[i], sigma);
  }
}

All code

# Write text file with model description in Stan language
cat(file = "model.stan", # This line is R code
"data { // This is the first line of Stan code
  int <lower = 0> n; // Define the format of all data
  vector[n] y; // ... including the dimension of vectors
  vector[n] x; //
}
parameters { // Define format for all parameters
  real alpha;
  real beta;
  real <lower = 0> sigma; // sigma (sd) cannot be negative
}
transformed parameters {
  vector[n] mu;
  for (i in 1:n){
    mu[i] = alpha + beta * x[i]; // Calculate linear predictor
  }
}
model {
  // Priors
  alpha ~ normal(0, 100);
  beta ~ normal(0, 100);
  sigma ~ cauchy(0, 10);
  // Likelihood (could be vectorized to increase speed)
  for (i in 1:n){
    y[i] ~ normal(mu[i], sigma);
  }
}

" )
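As the comment in the model block notes, the likelihood can be vectorized. A sketch of that alternative model block (same model, typically faster, since Stan evaluates one vectorized statement instead of n scalar ones):

```
model {
  // Priors
  alpha ~ normal(0, 100);
  beta ~ normal(0, 100);
  sigma ~ cauchy(0, 10);
  // Vectorized likelihood: replaces the for loop over observations
  y ~ normal(mu, sigma);
}
```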

Next step

Choose the HMC settings and run the sampler

# HMC settings
ni <- 3000  # iterations per chain
nb <- 1000  # warmup (burn-in) iterations
nc <- 4     # number of chains
nt <- 1     # thinning interval
# Call STAN 

modelstan <- stan(file = "model.stan", data = dataList,
                  chains = nc, iter = ni, warmup = nb, thin = nt)

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 3e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.3 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 3000 [  0%]  (Warmup)
Chain 1: Iteration:  300 / 3000 [ 10%]  (Warmup)
Chain 1: Iteration:  600 / 3000 [ 20%]  (Warmup)
Chain 1: Iteration:  900 / 3000 [ 30%]  (Warmup)
Chain 1: Iteration: 1001 / 3000 [ 33%]  (Sampling)
Chain 1: Iteration: 1300 / 3000 [ 43%]  (Sampling)
Chain 1: Iteration: 1600 / 3000 [ 53%]  (Sampling)
Chain 1: Iteration: 1900 / 3000 [ 63%]  (Sampling)
Chain 1: Iteration: 2200 / 3000 [ 73%]  (Sampling)
Chain 1: Iteration: 2500 / 3000 [ 83%]  (Sampling)
Chain 1: Iteration: 2800 / 3000 [ 93%]  (Sampling)
Chain 1: Iteration: 3000 / 3000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.053 seconds (Warm-up)
Chain 1:                0.089 seconds (Sampling)
Chain 1:                0.142 seconds (Total)
Chain 1: 

(Chains 2 to 4 print analogous progress output, each finishing in well under a second.)

Get results: traceplots

rstan::traceplot(modelstan)
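Beyond traceplots, the posterior draws themselves can be pulled out with rstan::extract() and summarized directly; for example (assuming modelstan is the fitted object from above):

```r
post <- rstan::extract(modelstan)      # named list of posterior draws
mean(post$beta < 0)                    # posterior prob. of a declining trend
quantile(post$beta, c(0.025, 0.975))   # 95% credible interval for the slope
```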

Get results: posterior summary

print(modelstan, dig = 2)
Inference for Stan model: anon_model.
4 chains, each with iter=3000; warmup=1000; thin=1; 
post-warmup draws per chain=2000, total post-warmup draws=8000.

         mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
alpha   38.03    0.04 1.83  34.44  36.84  38.02  39.22  41.60  2555    1
beta    -0.63    0.00 0.19  -1.01  -0.76  -0.63  -0.51  -0.25  2610    1
sigma    3.45    0.01 0.71   2.38   2.95   3.36   3.83   5.16  3090    1
mu[1]   37.40    0.03 1.67  34.14  36.31  37.40  38.47  40.69  2607    1
mu[2]   36.77    0.03 1.51  33.79  35.77  36.76  37.74  39.75  2682    1
mu[3]   36.13    0.03 1.36  33.44  35.24  36.13  37.01  38.84  2804    1
mu[4]   35.50    0.02 1.22  33.11  34.71  35.50  36.28  37.97  3007    1
mu[5]   34.87    0.02 1.10  32.72  34.15  34.86  35.57  37.09  3358    1
mu[6]   34.23    0.02 1.00  32.29  33.58  34.22  34.86  36.27  4002    1
mu[7]   33.60    0.01 0.92  31.75  33.00  33.59  34.18  35.50  5203    1
mu[8]   32.96    0.01 0.89  31.22  32.40  32.95  33.52  34.76  7329    1
mu[9]   32.33    0.01 0.89  30.58  31.77  32.32  32.88  34.11  9811    1
mu[10]  31.70    0.01 0.93  29.87  31.11  31.70  32.28  33.55 10078    1
mu[11]  31.06    0.01 1.00  29.09  30.42  31.07  31.70  33.07  9278    1
mu[12]  30.43    0.01 1.10  28.22  29.73  30.43  31.13  32.63  7141    1
mu[13]  29.80    0.02 1.23  27.34  29.01  29.80  30.58  32.25  5921    1
mu[14]  29.16    0.02 1.37  26.46  28.29  29.17  30.04  31.90  5108    1
mu[15]  28.53    0.02 1.52  25.52  27.55  28.53  29.51  31.55  4565    1
mu[16]  27.90    0.03 1.67  24.58  26.81  27.90  28.97  31.23  4190    1
lp__   -26.06    0.03 1.33 -29.45 -26.67 -25.72 -25.09 -24.55  2218    1

Samples were drawn using NUTS(diag_e) at Fri Nov 14 09:41:32 2025.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Get results: compare with truth and lm

stan_est <- summary(modelstan)$summary[1:3,1]
comp <- cbind(truth = truth, lm = lm_est, Stan = stan_est)
print(comp, 4)
      truth      lm    Stan
alpha  40.0 38.0633 38.0321
beta   -0.7 -0.6367 -0.6334
sigma   4.0  3.1666  3.4536

Challenge